Add count-trailing-zero libcalls #597

calc84maniac · 2025-04-28T04:00:13Z

I've implemented optimized count-trailing-zero libcalls, and I have an ez80-clang commit ready to push that makes the compiler use them. They've also been added to the ez80_builtin implementations where applicable, and tested via the ez80_builtin/stdbit tests.
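For reference, the value each count-trailing-zero libcall is expected to produce can be modeled in portable C. This is a minimal sketch: `ref_cttz` is a hypothetical helper, not one of the libcalls, and it assumes (as discussed later in this thread) that a zero input yields the operand width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not one of the actual libcalls:
 * counts trailing zero bits within the low `width` bits of x.
 * If none of those bits is set, it returns `width`, which matches
 * CTTZ semantics (as opposed to CTTZ_ZERO_UNDEF). */
static unsigned ref_cttz(uint32_t x, unsigned width)
{
    assert(width >= 1 && width <= 32);
    unsigned n = 0;
    /* walk upward from bit 0 until a set bit or the width is reached */
    while (n < width && ((x >> n) & 1u) == 0)
        n++;
    return n;
}
```

For example, with 24-bit operands (the eZ80's int width) an input of 0x000800 gives 11 and an input of 0 gives 24.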

runer112 · 2025-04-28T04:54:23Z

src/crt/cttz.src

+	tst	a, 030h
+	jr	z, .high2
+	dec	a
+	and	a, 014h
+	ret	po
+	ld	a, 5
+	ret
+.high2:
+	add	a, a
+	sbc	a, -8
+	ret	p
+	ld	a, 6
+	ret
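The quoted tail can be followed instruction by instruction with a small C model. This is a sketch under an assumption not stated in the review context: the part of the routine not shown here has already ruled out bits 0-3, so only the high nibble of A can be set on entry. `parity_odd` and `tail_original` are illustrative names, not toolchain functions. The .high2 path is what turns an input of 0 into 8 (0 + 8 with no carry), the property the next comment relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Z80-style parity: true when the byte has an odd number of set bits,
 * which is the condition `ret po` returns on after a logical op. */
static int parity_odd(uint8_t v)
{
    int odd = 0;
    while (v) {
        odd ^= v & 1;
        v >>= 1;
    }
    return odd;
}

/* C model of the quoted tail. Assumption: bits 0-3 of `a` are clear. */
static unsigned tail_original(uint8_t a)
{
    assert((a & 0x0F) == 0);
    if (a & 0x30) {                            /* tst a, 030h ; jr z, .high2 */
        uint8_t t = (uint8_t)((a - 1) & 0x14); /* dec a ; and a, 014h */
        if (parity_odd(t))                     /* ret po: taken only for t == 0x04 */
            return t;
        return 5;                              /* ld a, 5 ; ret */
    }
    /* .high2: add a, a shifts bit 7 into carry */
    unsigned carry = a >> 7;
    /* sbc a, -8 computes a + 8 - carry (mod 256) */
    uint8_t r = (uint8_t)((uint8_t)(a << 1) + 8 - carry);
    if ((r & 0x80) == 0)                       /* ret p */
        return r;                              /* 7, or 8 when a == 0 */
    return 6;                                  /* ld a, 6 ; ret */
}
```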


This unfortunately breaks the optimization in the first-trailing-one functions, which rely on this routine returning 8 for an input of 0 (the sequence below returns 7), but it might be worth it?

Suggested change

-	tst	a, 030h
-	jr	z, .high2
-	dec	a
-	and	a, 014h
-	ret	po
-	ld	a, 5
-	ret
-.high2:
-	add	a, a
-	sbc	a, -8
-	ret	p
-	ld	a, 6
-	ret
+	add	a, a
+	add	a, a
+	jr	z, .high2
+	add	a, a
+	add	a, a
+	sbc	a, -5
+	ret
+.high2:
+	sbc	a, -7
+	ret
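Modeled the same way (again assuming bits 0-3 of A are already clear on entry; the names are illustrative), the suggested sequence still maps every nonzero high nibble to the right count, but an input of 0 reaches .high2 with the carry clear and returns 7 instead of 8, which is exactly the trade-off raised above.

```c
#include <assert.h>
#include <stdint.h>

/* One `add a, a`: doubles *a and returns the bit shifted out (new carry). */
static unsigned add_a_a(uint8_t *a)
{
    unsigned carry = *a >> 7;
    *a = (uint8_t)(*a << 1);
    return carry;
}

/* C model of the suggested tail. Assumption: bits 0-3 of `a` are clear. */
static unsigned tail_suggested(uint8_t a)
{
    unsigned carry;
    assert((a & 0x0F) == 0);
    add_a_a(&a);                               /* add a, a */
    carry = add_a_a(&a);                       /* add a, a */
    if (a == 0)                                /* jr z, .high2 */
        return (uint8_t)(a + 7 - carry);       /* .high2: sbc a, -7 */
    add_a_a(&a);                               /* add a, a */
    carry = add_a_a(&a);                       /* add a, a (a is now 0) */
    return (uint8_t)(a + 5 - carry);           /* sbc a, -5 */
}
```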

I guess this change would also hurt for maybe eventually implementing std::countr_zero, wouldn't it...

Unfortunately, the compiler makes that assumption too: this libcall implements CTTZ, which is what __builtin_ctz intrinsics lower to on the Z80 target, rather than CTTZ_ZERO_UNDEF, so a zero input must return the bit width. Good optimization idea, though; I'll have to think about whether I can do something similar.

I guess this change would also hurt for maybe eventually implementing std::countr_zero, wouldn't it...

It was actually implemented in the toolchain recently, and it relies on the Z80 intrinsic behavior I mentioned.
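For comparison outside the Z80 target: LLVM's CTTZ node is defined for a zero operand and yields the bit width, while CTTZ_ZERO_UNDEF leaves that case undefined, and C++20's std::countr_zero likewise returns the bit width for zero. GCC documents __builtin_ctz(0) as undefined, so portable C that wants CTTZ semantics has to guard the zero case itself. A minimal sketch; `ctz_or_width` is a hypothetical helper, not part of the toolchain:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper giving CTTZ semantics for unsigned int:
 * a zero input returns the bit width instead of calling
 * __builtin_ctz(0), whose result GCC documents as undefined. */
static unsigned ctz_or_width(unsigned x)
{
    return x != 0 ? (unsigned)__builtin_ctz(x)
                  : (unsigned)(sizeof x * CHAR_BIT);
}
```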

It occurs to me that, except for that pesky 0 case, you could rework this whole routine around branches after pairs of left shifts. Whether that would be faster probably depends on the input distribution, but it would certainly be smaller.
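One possible reading of the shift-pair idea, sketched in C (an interpretation, not code from this PR; `ctz8_shift_pairs` is a hypothetical name): shift A left two bits at a time, as with two `add a, a`, and branch out as soon as the remaining byte is zero. At that point the lowest set bit was one of the two bits just shifted out, so the carry from the second shift picks between the two candidate counts. An input of 0 leaves through the first branch as 7, which is the pesky case.

```c
#include <assert.h>
#include <stdint.h>

/* Byte ctz built only from pairs of left shifts and a zero test.
 * Returns 7 (not 8) for x == 0. */
static unsigned ctz8_shift_pairs(uint8_t x)
{
    uint8_t a = x;
    unsigned base = 8;
    unsigned carry;
    do {
        base -= 2;
        carry = (a >> 6) & 1u;      /* bit shifted out by the second add */
        a = (uint8_t)(a << 2);      /* add a, a ; add a, a */
    } while (a != 0);               /* at most four passes: a << 8 is 0 */
    /* lowest set bit was bit (base + 1) or, if carry, bit base */
    return base + (carry ? 0u : 1u);
}
```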

runer112 · 2025-04-28T05:00:42Z

src/crt/cttz.src

+	ld	a, l
+	or	a, a


Matching __icttz's first 3 bytes might compress better.

Suggested change

-	ld	a, l
-	or	a, a
+	xor	a, a
+	or	a, l

Good point, I always forget to think about compression alongside other size optimizations.
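Both sequences are two one-byte instructions (7Dh B7h versus AFh B5h) and leave the same observable state: A equals L, S, Z and P/V reflect L, and carry is cleared, because the final `or` alone determines those flags. That is why the swap costs nothing in size and only changes the byte pattern the compressor sees. The C model below checks this for every value of L; it covers only the flags listed (H and N are reset by `or` in both cases), and the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal Z80 state: accumulator plus the flags compared here. */
struct z80_state {
    uint8_t a;
    int s, z, pv, c;   /* sign, zero, parity/overflow, carry */
};

static int even_parity(uint8_t v)
{
    int ones = 0;
    for (; v != 0; v >>= 1)
        ones += v & 1;
    return (ones & 1) == 0;
}

/* `or a, r`: A |= r; S, Z and P/V (parity) follow the result, C is cleared. */
static void op_or(struct z80_state *st, uint8_t r)
{
    st->a |= r;
    st->s = st->a >> 7;
    st->z = st->a == 0;
    st->pv = even_parity(st->a);
    st->c = 0;
}

/* Original: ld a, l (7Dh) ; or a, a (B7h) */
static struct z80_state seq_ld_or(uint8_t l)
{
    struct z80_state st = {0};
    st.a = l;          /* ld a, l affects no flags */
    op_or(&st, st.a);  /* or a, a */
    return st;
}

/* Suggested: xor a, a (AFh) ; or a, l (B5h) */
static struct z80_state seq_xor_or(uint8_t l)
{
    struct z80_state st = {0};
    st.a ^= st.a;      /* xor a, a: A = 0; its flags are overwritten next */
    op_or(&st, l);     /* or a, l */
    return st;
}

static int same_state(struct z80_state x, struct z80_state y)
{
    return x.a == y.a && x.s == y.s && x.z == y.z &&
           x.pv == y.pv && x.c == y.c;
}
```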

test/standalone/ez80_builtin/src/main.c

calc84maniac added the crt label Apr 28, 2025

calc84maniac force-pushed the cttz-libcalls branch from 2eae7db to a61a932 April 28, 2025 04:05

calc84maniac temporarily deployed to Autotester April 28, 2025 04:05 with GitHub Actions

calc84maniac requested review from runer112 and mateoconlechuga April 28, 2025 04:11

Add count-trailing-zero libcalls (98bec4b)

calc84maniac force-pushed the cttz-libcalls branch from a61a932 to 98bec4b April 28, 2025 04:30

calc84maniac temporarily deployed to Autotester April 28, 2025 04:30 with GitHub Actions

calc84maniac deployed to Autotester April 28, 2025 04:30 with GitHub Actions

runer112 reviewed Apr 28, 2025

runer112 added the libc label Apr 28, 2025
